Cross-domain robust acoustic training
Abstract
This paper describes our efforts towards cross-domain acoustic training for Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We used weighted multi-style training, pooling limited telephony (landline and cellular) data with downsampled wideband clean data, to develop better hybrid acoustic models. We explored the effect of decision tree size on accuracy by approximately 10%. The results show that, for a fixed number of parameters, a system with fewer context-dependent HMM states yields better accuracy, which motivates a smaller phone set design. We then investigated the performance degradation of two reduced phone sets for Spanish. Based on these studies, we developed a hybrid system for 8 kHz close-talking microphone, telephony landline, and cellular phone environments. The acoustic model is evaluated with the IBM ViaVoice product engine on both flat-grammar tasks (digits and name-at-department) and language-model tasks (ATIS and general dictation).
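The two data-preparation ideas in the abstract, downsampling wideband clean speech to the 8 kHz telephony rate and pooling corpora with per-corpus weights, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's ViaVoice training pipeline: the resampler (a windowed-sinc FIR followed by decimation by 2) is an assumed choice, since the abstract does not name one, and "weighting" is modeled as scaling each corpus's frame counts when accumulating the sufficient statistics of a single diagonal Gaussian, one plausible reading of weighted multi-style training. The names `downsample_by_2` and `pooled_diag_gaussian`, the weights, and the synthetic data are all hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np


def downsample_by_2(x: np.ndarray, num_taps: int = 63) -> np.ndarray:
    """Halve the sampling rate of a 1-D signal (e.g. 16 kHz -> 8 kHz).

    Assumed resampler: a Hamming-windowed sinc low-pass FIR with cutoff
    0.25 cycles/sample (the new Nyquist frequency) limits aliasing, then
    every second sample is kept.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2       # tap offsets centred on 0
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(num_taps)  # ideal LPF times window
    h /= h.sum()                                       # unity gain at DC
    y = np.convolve(x, h, mode="same")                 # output aligned with input
    return y[::2]


def pooled_diag_gaussian(
    corpora: Sequence[np.ndarray], weights: Sequence[float]
) -> Tuple[np.ndarray, np.ndarray]:
    """Weighted ML estimate of a diagonal Gaussian from pooled corpora.

    Assumption: corpus weighting means every frame of corpora[k], a
    (frames, dims) matrix, counts weights[k] times in the zeroth-, first-
    and second-order statistics.
    """
    if len(corpora) != len(weights):
        raise ValueError("need one weight per corpus")
    dims = corpora[0].shape[1]
    count = 0.0
    s1 = np.zeros(dims)
    s2 = np.zeros(dims)
    for feats, w in zip(corpora, weights):
        count += w * feats.shape[0]         # weighted frame count
        s1 += w * feats.sum(axis=0)         # weighted sum of features
        s2 += w * (feats ** 2).sum(axis=0)  # weighted sum of squares
    mean = s1 / count
    var = s2 / count - mean ** 2
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wideband = rng.standard_normal(16000)   # 1 s of synthetic 16 kHz audio
    narrowband = downsample_by_2(wideband)  # 8000 samples at 8 kHz
    # Hypothetical 13-dim features: scarce telephony vs. plentiful clean 8 kHz data.
    telephony = rng.normal(1.0, 1.0, size=(200, 13))
    clean_8k = rng.normal(0.0, 1.0, size=(2000, 13))
    mean, var = pooled_diag_gaussian([telephony, clean_8k], weights=[5.0, 1.0])
    print(narrowband.shape, mean[:3].round(2))
```

Giving the scarce telephony corpus a weight above 1 pulls the pooled statistics toward the target channel while the larger downsampled clean corpus still contributes coverage; in a full HMM system the same scaling would apply to the state-occupancy-weighted statistics accumulated during EM.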
Similar resources
Cross-Corpus Normalization Of Diverse Acoustic Training Data for Robust HMM Training
We investigate the use of heterogeneous data sources for acoustic training. One of the prominent problems in modelling the process of speech communication is that of variability. This problem is further exaggerated by the use of diverse acoustic data. We describe an acoustic normalization procedure for enlarging an ASR acoustic training set with out-of-domain acoustic data. A larger in-domain t...
Invariant Representations for Noisy Speech Recognition
Modern automatic speech recognition (ASR) systems need to be robust under acoustic variability arising from environmental, speaker, channel, and recording conditions. Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training. We attempt to address this problem by encouraging the ne...
Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings
Adaptation of Automatic Speech Recognition (ASR) systems to a new domain (channel, speaker, topic, etc.) remains a significant challenge, as often, only a limited amount of target domain data for adaptation of Acoustic Models (AMs) is available. However, unlike GMMs, to date, there has not been an established, efficient method for adapting current state-of-the-art Convolutional Neural Network (C...
Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR
Recently, it was shown that the performance of supervised time-frequency masking based robust automatic speech recognition techniques can be improved by training them jointly with the acoustic model [1]. The system in [1], termed deep neural network based joint adaptive training, used fully-connected feedforward deep neural networks for estimating time-frequency masks and for acoustic modeling; ...
Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization
The success of deep neural network (DNN) acoustic models is partly owed to large amounts of training data available for different applications. This work investigates ways to improve DNN acoustic models for Bluetooth narrowband mobile applications when relatively small amounts of in-domain training data are available. To address the challenge of limited in-domain data, we use cross-bandwidth and...